Abstract — The capacity per unit cost, or equivalently the minimum
cost to transmit one bit, is a well-studied quantity. It has been
studied under the assumption of full synchrony between the 
transmitter and the receiver. In many applications, such as sensor 
networks, transmissions are very bursty, with small amounts of 
bits arriving infrequently at random times. In such scenarios, 
the cost of acquiring synchronization is significant and one is 
interested in the fundamental limits on communication without 
assuming a priori synchronization. In this paper, we show that the
minimum cost to transmit B bits of information asynchronously
is $(B + H)\,k_{\text{sync}}$, where $k_{\text{sync}}$ is the synchronous minimum cost
per bit and $H$ is a measure of timing uncertainty equal to the
entropy for most reasonable arrival time distributions.



I. Introduction 

Synchronization is an important component of any commu- 
nication system. To understand the cost of synchronization on 
communication performance, it is helpful to divide applica- 
tions into two rough types. In the first type, transmission of 
data happens on a continuous basis. Examples are voice and 
video. The cost of initially acquiring symbol synchronization, 
say by sending a pilot sequence, is relatively small in such 
applications because the cost can be amortized over the many 
symbols transmitted. In the second type, transmissions are 
very bursty, with very small amounts of data transmitted once 
in a long while. Examples are sensor networks with sensor 
nodes transmitting measured data once in a while. The cost 
of acquiring synchronization is relatively more significant in 
such applications because the number of bits transmitted per 
burst is relatively small. 

What is the fundamental limitation due to the lack of a 
priori synchrony between the transmitter and the receiver in 
bursty communication? While there has been a lot of research 
on specific synchronization algorithms, this question has only 
recently been pursued [4], [5]. In their model, transmission of
a message starts at a random time unknown to the receiver.
The performance measure is the data rate: the number of bits
in the message divided by the elapsed time between the instant 
information starts being sent and the instant it is decoded. 

The data rate is a sensible performance metric for bursty 
communication if the information to be communicated is 
delay-sensitive. Then, maximizing the data rate is equivalent 
to minimizing the time to transmit the burst of data. In many 
applications, however, the allowable delay may not be so 
tightly constrained, so the data rate is less relevant a measure 
than the energy needed to transmit the information. In this 
case, the minimum energy needed to transmit one bit of 
information is an appropriate fundamental measure. Thus, we
are led to ask the following question: what is the impact of
asynchrony on the minimum energy needed to transmit one
bit of information?

This type of question falls into the general framework
of capacity per unit cost [3], [6], where one is interested
in characterizing the maximum number of bits that can be
reliably communicated per unit cost of using the channel.
Consider the following modification of the formulation in [4],
[5] to study asynchronous capacity per unit cost.

There are B bits of information that need to be communicated.
The number B can be viewed as the size of a
burst in the above scenario, with consecutive bursts occurring 
so infrequently that we can consider each burst in complete 
isolation. The B bits are coded and transmitted over a mem- 
oryless channel using a sequence of symbols that have costs 
associated with them. The rate R per unit cost is the total 
number of bits divided by the cost of the transmitted sequence. 

The data burst arrives at a random symbol time $\nu$, not known
a priori to the receiver. Without knowing $\nu$, the goal of the
receiver is to reliably decode the information bits by observing
the outputs of the channel. Although the receiver does not
know $\nu$, we assume that both the transmitter and the receiver
know that $\nu$ lies in the range from 1 to A. The integer A
characterizes the asynchronism level or the timing uncertainty
between the transmitter and the receiver. At all times before
and after the actual transmission, the receiver observes pure
noise. The noise distribution corresponds to a special 'idle
symbol' $\star$ being sent across the channel.

The main result in the paper is a single-letter characterization
of the asynchronous capacity per unit cost, or, equivalently,
the minimum cost to transmit one bit of information.
Under the further assumption that the idle symbol $\star$ is allowed
to be used in the codewords and has zero cost, the result
simplifies and admits a very simple interpretation: for B
large, the minimum cost to transmit B bits of information
asynchronously is

$$(B + \log A)\, k_{\text{sync}},$$

where $k_{\text{sync}}$ is the minimum cost to transmit one bit of
information in the synchronous setting.¹ Thus, the timing
uncertainty imposes an additional cost of $k_{\text{sync}} \log A$ as
compared to the synchronous setting. Note that this result implies
that the additional cost is significant only when A is at least
exponential in B.

¹ In this paper, all logarithms are taken to base 2.



Even though we do not have a stringent requirement on the 
delay from the time of data arrival to the time of decoding, a 
meaningful result cannot be obtained if there is no constraint 
at all. This can be seen by noting that the transmitter could 
always wait until the end of the arrival time interval (at time 
A) to transmit information. Then, there would be no price to pay
for the timing uncertainty, but the delay incurred would be 
very large, exponential in B. In contrast, the above result can 
be achieved by a coding scheme whose delay is much shorter, 
linear in the number of bits B, and we show that performance 
cannot be improved within the broad class of coding schemes 
whose delays are sub-exponential in B. A delay linear in B
is of the same order as the delay incurred in the synchronous
setting [6]. This means that the start time of information
transmission is highly random to the receiver and the additional cost
is the cost needed to construct codewords that allow a decoder
to resolve this uncertainty. More generally, we also show that
when the allowable delay D scales exponentially with B (but
no larger than A), the minimum cost to transmit B bits can
be further reduced to

$$\left(B + \log \frac{A}{D}\right) k_{\text{sync}}.$$



The above results are all proved under a uniform distribution
on the arrival time $\nu$. They can be generalized to a broad class
of other distributions, with $\log A$ replaced by a quantity $H$
which equals the entropy for most reasonable distributions.
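To make the two cost formulas above concrete, the following minimal sketch evaluates them numerically. The function name `async_min_cost` and its parameters are illustrative choices, not notation from the paper, and the formulas are first-order expressions that hold only for large B.

```python
import math

def async_min_cost(num_bits, window, k_sync, delay=1):
    """First-order minimum cost (B + log2(A / D)) * k_sync, for large B.

    num_bits -- B, the burst size in bits
    window   -- A, the number of possible arrival times
    k_sync   -- synchronous minimum cost per bit
    delay    -- D, the allowed delay; the default 1 gives the
                sub-exponential-delay formula (B + log2 A) * k_sync
    """
    if not 1 <= delay <= window:
        raise ValueError("delay D must satisfy 1 <= D <= A")
    # Subtract logarithms instead of dividing so huge integers stay exact.
    timing_bits = math.log2(window) - math.log2(delay)
    return (num_bits + timing_bits) * k_sync

# B = 100 bits and A = 2^100 arrival times: timing uncertainty doubles the cost.
cost_short_delay = async_min_cost(100, 2**100, k_sync=1.0)
# Allowing a delay D = 2^40 removes 40 of the 100 "timing bits".
cost_long_delay = async_min_cost(100, 2**100, k_sync=1.0, delay=2**40)
```

With these example numbers the extra cost is comparable to the cost of the data itself, which illustrates why the penalty matters only when A is at least exponential in B.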

II. Model and Performance Criterion 

Our model captures the following features: 

• Information is available at the transmitter at a random 
time; 

• The transmitter chooses when to start sending informa- 
tion; 

• Outside the information transmission period, the trans- 
mitter stays idle and the receiver observes noise; 

• The receiver decodes without knowing the information
arrival time at the transmitter.

Communication is discrete-time and carried over a discrete
memoryless channel characterized by its finite input and output
alphabets $\mathcal{X} \cup \{\star\}$ and $\mathcal{Y}$, respectively, and transition probability
matrix $Q(y|x)$ for all $x \in \mathcal{X} \cup \{\star\}$ and $y \in \mathcal{Y}$. Here $\star$ denotes
the special idle symbol and $\mathcal{X}$ denotes the alphabet containing
the symbols that can be used in the actual transmission of the
data. $\mathcal{X}$ may or may not contain $\star$.

Given B information bits to be transmitted, a codebook $\mathcal{C}$
consists of $M = 2^B$ codewords of length N composed of
symbols from $\mathcal{X}$. The message m arrives at the transmitter at
a random time $\nu$, independent of m, and uniformly distributed
over $\{1, 2, \ldots, A\}$, where the integer $A \ge 1$ characterizes the
asynchronism level between the transmitter and the receiver.
Only one message arrives over the period $[1, 2, \ldots, A+N-1]$.
If $A = 1$, the channel is said to be synchronous.

The transmitter chooses a time $\sigma(\nu, m)$ to begin transmitting
the codeword $c(m) \in \mathcal{C}$ assigned to message m. The transmitter
need not start sending information right at the time when
the message is available, i.e., at time $\nu$. The only constraint $\sigma$
must satisfy is that

$$\nu \le \sigma(\nu, m) \le A,$$

i.e., the transmitter cannot start transmitting before the message
arrives or after the end of the uncertainty window. In the
rest of the paper, we suppress the arguments $\nu$ and m of $\sigma$
when these arguments are clear from context.

Fig. 1. Time representation of what is sent (upper arrow) and what is received
(lower arrow). The '$\star$' represents the 'idle' symbol. Message m arrives at time
$\nu$, starts being sent at time $\sigma$, and decoding occurs at time $\tau$.

Before and after the codeword transmission, i.e., before
time $\sigma$ and after time $\sigma + N - 1$, the receiver observes
'pure noise.' Specifically, conditioned on the event $\{\nu = t\}$,
$t \in \{1, 2, \ldots, A\}$, and on the message to be conveyed m, the
receiver observes independent symbols

$$Y_1, Y_2, \ldots, Y_{A+N-1}$$

distributed as follows. For $1 \le i \le \sigma(t,m) - 1$ or
$\sigma(t,m) + N \le i \le A + N - 1$, the $Y_i$'s are distributed according
to $Q(\cdot|\star)$. At any time $i \in \{\sigma, \sigma+1, \ldots, \sigma+N-1\}$, the
distribution is

$$Q(\cdot \,|\, c_{i-\sigma+1}(m)),$$

where $c_i(m)$ denotes the ith symbol of the codeword $c(m)$.
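The output model just described can be sketched as a small simulation. This is an illustration only: the function `received_sequence`, the string `"*"` standing for the idle symbol, and the dictionary layout of the channel are all assumptions introduced here, not part of the paper.

```python
import random

IDLE = "*"  # stands for the idle symbol

def received_sequence(codeword, sigma, A, channel, rng):
    """Return [Y_1, ..., Y_{A+N-1}] (list index 0 holds Y_1).

    codeword -- sequence of N input symbols c_1(m), ..., c_N(m)
    sigma    -- time at which transmission starts, 1 <= sigma <= A
    channel  -- dict: input symbol -> {output symbol: probability}
    rng      -- a random.Random instance
    """
    N = len(codeword)
    if not 1 <= sigma <= A:
        raise ValueError("sigma must lie in {1, ..., A}")
    outputs = []
    for i in range(1, A + N):
        if sigma <= i <= sigma + N - 1:
            x = codeword[i - sigma]  # this is c_{i - sigma + 1}(m), 0-based
        else:
            x = IDLE                 # pure noise before and after the codeword
        symbols, weights = zip(*channel[x].items())
        outputs.append(rng.choices(symbols, weights=weights)[0])
    return outputs

# Noiseless toy channel: idle produces 0, the data symbol "a" produces 1.
toy_channel = {IDLE: {0: 1.0}, "a": {1: 1.0}}
toy_outputs = received_sequence(["a", "a"], sigma=3, A=4,
                                channel=toy_channel, rng=random.Random(0))
```

In the toy run the two data symbols occupy times 3 and 4 of the five-symbol output, and every other position carries the idle-symbol output.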

Knowing the asynchronism level A, but not the value of
$\nu$, the receiver decodes by means of a sequential test $(\tau, \phi)$,
where $\tau$ is a stopping time, bounded by $A+N-1$, with respect
to the output sequence $Y_1, Y_2, \ldots$ indicating when decoding
happens, and where $\phi$ denotes a decision rule that declares
the decoded message (see Fig. 1). Recall that a (deterministic
or randomized) stopping time $\tau$ with respect to a sequence
of random variables $Y_1, Y_2, \ldots$ is a positive, integer-valued
random variable such that the event $\{\tau = t\}$, conditioned
on the realization of $Y_1, Y_2, \ldots, Y_t$, is independent of the
realization of $Y_{t+1}, Y_{t+2}, \ldots$, for all $t \ge 1$. Given $\{\tau = t\}$,
$t \in \{1, 2, \ldots, A+N-1\}$, the function $\phi$ outputs a message
based on the past observations from time 1 up to time t.²

A 'code' refers to a codebook $\mathcal{C}$ together with a decoder,
i.e., a sequential test $(\tau, \phi)$. Throughout the paper, whenever
clear from context, we often refer to a code using the codebook
symbol $\mathcal{C}$ only, leaving the decoder implicit.

The maximum (over messages) decoding error probability
for a given code $\mathcal{C}$ is defined as

$$P(\mathcal{E}|\mathcal{C}) = \max_m \frac{1}{A} \sum_{t=1}^{A} P_{m,t}(\mathcal{E}), \tag{1}$$

where the subscripts 'm,t' indicate conditioning on the event
that message m arrives at time $\nu = t$, and where $\mathcal{E}$ indicates
the event that the decoded message does not correspond to the
sent codeword.

² To be more precise, $\phi$ is any $\mathcal{F}_\tau$-measurable function that takes value in
the message set, where $\mathcal{F}_t$ is the sigma field generated by $Y_1, Y_2, \ldots, Y_t$.

Definition 1 (Cost Function). A cost function $k : \mathcal{X} \to [0, \infty]$
assigns a non-negative value to each channel input.

Definition 2 (Cost of a Code). The (maximum) cost of a code
$\mathcal{C}$ is defined as

$$K(\mathcal{C}) = \max_m \sum_{i=1}^{N} k(c_i(m)).$$

Definition 3 (Delay of a Code). Given $\varepsilon > 0$, the (maximum)
delay of a code $\mathcal{C}$, denoted by $D(\mathcal{C}, \varepsilon)$, is defined as the
smallest d such that

$$\min_m P_m(\tau - \nu \le d) \ge 1 - \varepsilon,$$

where $P_m$ denotes the output distribution conditioned on the
sending of message m.³
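Definitions 2 and 3 can be illustrated with a short sketch. The helper names `code_cost` and `empirical_delay` are hypothetical, and the second function only estimates the delay from observed samples of $\tau - \nu$; it is not the exact quantity $D(\mathcal{C}, \varepsilon)$, which is defined through the true output distributions.

```python
import math

def code_cost(codebook, cost):
    """K(C): the largest total symbol cost over all codewords (Definition 2)."""
    return max(sum(cost[x] for x in codeword) for codeword in codebook)

def empirical_delay(delay_samples, eps):
    """Estimate D(C, eps) from samples of tau - nu, one list per message.

    Returns the smallest d such that, for every message, at least a
    fraction 1 - eps of that message's samples satisfy tau - nu <= d.
    """
    worst = 0
    for samples in delay_samples:
        ordered = sorted(samples)
        # Number of samples that must fall at or below d for this message.
        needed = max(1, math.ceil((1 - eps) * len(ordered)))
        worst = max(worst, ordered[needed - 1])
    return worst

# Two codewords over {"a", "*"} with k("a") = 1 and k("*") = 0.
example_cost = code_cost([("a", "*"), ("a", "a")], {"a": 1.0, "*": 0.0})
# Delay samples for two messages; one outlier of 10 is tolerated at eps = 0.25.
example_delay = empirical_delay([[1, 2, 3, 10], [2, 2, 2, 2]], eps=0.25)
```

Taking the maximum over messages in both helpers mirrors the worst-case (maximum) nature of the two definitions.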

A key parameter we shall be concerned with is

$$\beta = \frac{\log A}{B},$$

which we call the timing uncertainty per information bit.

Next, we define the asynchronous capacity per unit cost in
the asymptotic regime where $B \to \infty$ while $\beta$ is kept fixed.

Definition 4 (Asynchronous Capacity per Unit Cost). R is
an achievable rate per unit cost at timing uncertainty per
information bit $\beta$ and delay exponent $\delta$ if there exists a
sequence of codes $\{\mathcal{C}_B\}$ and a sequence of numbers $\{\varepsilon_B\}$
with $\varepsilon_B \to 0$, such that

$$P(\mathcal{E}|\mathcal{C}_B) \le \varepsilon_B,$$

$$\limsup_{B \to \infty} \frac{\log D(\mathcal{C}_B, \varepsilon_B)}{B} \le \delta,$$

and

$$\liminf_{B \to \infty} \frac{B}{K(\mathcal{C}_B)} \ge R.$$

The asynchronous capacity per unit cost, denoted by $C(\beta, \delta)$,
is the largest achievable rate per unit cost. In the important
case when $\delta = 0$, we define $C(\beta) = C(\beta, 0)$.

Note that, in Definition 4, the codeword length N is a free
parameter that can be optimized, just as for the synchronous
capacity per unit cost (see comment after [6, Definition 2]).
The results in the next section characterize the capacity per
unit cost for arbitrary $\beta$ and $\delta$ and arbitrary alphabet $\mathcal{X}$.
As in the synchronous case, the results simplify
when there is a zero cost symbol, specifically when $\mathcal{X}$ contains
$\star$ and $\star$ has zero cost.

³ Hence, by definition we have $P_m(\cdot) = \frac{1}{A} \sum_{t=1}^{A} P_{m,t}(\cdot)$.



III. Results 

Our first result gives the asynchronous capacity per unit cost
when $\delta = 0$. It can be viewed as the asynchronous analogue
of Theorem 2 in [6], which states that the synchronous
capacity per unit cost is

$$\max_X \frac{I(X;Y)}{E[k(X)]}. \tag{2}$$

Theorem 1 (Asynchronous Capacity per Unit Cost: Sub-exponential
Delay Constraint). The asynchronous capacity per
unit cost at delay exponent $\delta = 0$ is given by

$$C(\beta) = \max_X \min\left\{ \frac{I(X;Y)}{E[k(X)]},\ \frac{I(X;Y) + D(Y\|Y_\star)}{E[k(X)](1+\beta)} \right\}, \tag{3}$$

where X denotes the random input to the channel, Y the
corresponding output, $Y_\star$ the random output of the channel when
the idle symbol $\star$ is transmitted (i.e., $Y_\star \sim Q(\cdot|\star)$), $I(X;Y)$
the mutual information between X and Y, and $D(Y\|Y_\star)$
the Kullback-Leibler distance between the distributions of Y
and $Y_\star$.⁶

Furthermore, capacity can be achieved by codes whose
delay grows linearly with B.
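The max-min expression of Theorem 1 can be explored numerically for small channels. The sketch below is an illustration under stated assumptions: it handles only a two-letter input alphabet, it replaces the exact maximization by a grid search, and the names `kl_divergence`, `theorem1_objective`, and `grid_search_binary` are introduced here for convenience.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def theorem1_objective(px, Q, cost, idle_row, beta):
    """min of the two ratios in (3) for one input distribution px.

    Q[x] is the output distribution given input x; idle_row is Q(.|idle).
    Assumes the mean cost under px is strictly positive.
    """
    inputs = range(len(px))
    py = [sum(px[x] * Q[x][y] for x in inputs) for y in range(len(idle_row))]
    mutual_info = sum(px[x] * kl_divergence(Q[x], py) for x in inputs)
    mean_cost = sum(p * c for p, c in zip(px, cost))
    sync_term = mutual_info / mean_cost
    async_term = (mutual_info + kl_divergence(py, idle_row)) / (mean_cost * (1 + beta))
    return min(sync_term, async_term)

def grid_search_binary(Q, cost, idle_row, beta, steps=1000):
    """Approximate the max over input distributions [1 - a, a], a in (0, 1]."""
    best = 0.0
    for j in range(1, steps + 1):
        a = j / steps
        best = max(best, theorem1_objective([1 - a, a], Q, cost, idle_row, beta))
    return best

# Input 0 is the zero-cost idle symbol, input 1 costs one unit per use.
Q_example = [[0.9, 0.1], [0.1, 0.9]]
approx_capacity = grid_search_binary(Q_example, [0.0, 1.0], Q_example[0], beta=1.0)
```

For this example the grid search returns $D(Y_1\|Y_\star)/(1+\beta)$ up to rounding, which agrees with the zero-cost simplification stated in Theorem 2.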

The two terms in (3) reflect the two constraints on reliable
communication. The first term corresponds to the standard
constraint that the number of bits that can reliably be
transmitted per channel use cannot exceed the input-output
mutual information. This constraint applies when the channel
is synchronous, hence also in the absence of synchrony. To see
this, note that by swapping the max and the min in (3), we
deduce that $C(\beta)$ is at most (2), the synchronous capacity
per unit cost.

The second term in (3) corresponds to the receiver's ability
to determine the arrival time $\nu$ of the data. Indeed, even
though the decoder is required only to produce a message
estimate, because of the delay constraint there is no loss in
terms of capacity per unit cost to also require the decoder
to (approximately) locate the time codeword transmission
begins: the delay constraint forces the decoder to locate the
sent message within a time window that is negligible compared
to A. The quantity $I(X;Y) + D(Y\|Y_\star)$ measures how difficult
it is for the receiver to discern a data-carrying transmitted
symbol from pure noise and thus determines how difficult it
is for the receiver to get the timing correct.

When the alphabet $\mathcal{X}$ contains a zero-cost symbol 0, the
synchronous result (2) simplifies, and Theorem 3 in [6] says
that the synchronous capacity per unit cost becomes

$$\max_{x \in \mathcal{X}} \frac{D(Y_x\|Y_0)}{k(x)}, \tag{4}$$

an optimization over the input alphabet instead of over the
set of all input distributions, where $Y_x$ refers to the output
distribution given that x is transmitted.

We find an analogous simplification in the asynchronous
setting when $\star$ is in $\mathcal{X}$ and has zero cost:



⁴ Throughout the paper, we assume that Q has non-zero capacity, for
otherwise the capacity per unit cost and, a fortiori, the asynchronous capacity per
unit cost, are equal to zero.

⁵ Throughout the paper we use the standard 'big O' notation to characterize
growth rates.

⁶ $Y_\star$ can be interpreted as 'pure noise.'



Theorem 2 (Asynchronous Capacity per Unit Cost With Zero
Cost Symbol: Sub-exponential Delay Constraint). If $\star$ is in $\mathcal{X}$
and has zero cost, the asynchronous capacity per unit cost at
delay exponent $\delta = 0$ is given by

$$C(\beta) = \frac{1}{1+\beta} \max_{x \in \mathcal{X}} \frac{D(Y_x\|Y_\star)}{k(x)}, \tag{5}$$

and capacity can be achieved by codes whose delay grows
linearly with B.
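Formula (5) is a maximization over individual input symbols, so it is straightforward to evaluate. The sketch below assumes a finite channel stored as a dictionary and skips zero-cost symbols in the maximization; the function name `zero_cost_capacity` and the toy channel are illustrative assumptions, not part of the paper.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def zero_cost_capacity(Q, cost, idle, beta):
    """Evaluate (5): max over x of D(Y_x || Y_idle) / k(x), divided by 1 + beta.

    Q    -- dict: input symbol -> output distribution (list of probabilities)
    cost -- dict: input symbol -> cost k(x); the idle symbol must cost 0
    Only symbols with strictly positive cost enter the maximization.
    """
    best = max(
        kl_divergence(Q[x], Q[idle]) / cost[x]
        for x in Q
        if cost[x] > 0
    )
    return best / (1 + beta)

# Toy channel: "a" is expensive but very distinguishable from noise,
# "b" is cheap and only moderately distinguishable.
Q_toy = {"*": [0.9, 0.1], "a": [0.1, 0.9], "b": [0.5, 0.5]}
k_toy = {"*": 0.0, "a": 1.0, "b": 0.25}
sync_capacity = zero_cost_capacity(Q_toy, k_toy, "*", beta=0.0)
async_capacity = zero_cost_capacity(Q_toy, k_toy, "*", beta=1.0)
```

Setting `beta=0.0` recovers the synchronous expression (4), so the ratio of the two values is exactly the factor $1+\beta$ by which asynchronism scales the cost per bit.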

Hence, a lack of synchronization multiplies the cost of sending
one bit of information by $1 + \beta$. An intuitive justification
for this is as follows. Suppose there exists an optimal coding
scheme that can both isolate and locate the sent message
with high probability. As alluded to above, the ability to
'locate' the message is a consequence of the decoder's delay
constraint. This allows us to consider message/location pairs
as inducing a code of size $\approx A \cdot 2^B$ used for communication
across the synchronous channel. Hence, if, say, N grows
sub-exponentially with B, we are effectively communicating
$\approx \beta B + B = B(1+\beta)$ bits reliably over the synchronous
channel. Therefore, sending B bits of information at
asynchronism level $\beta$ is at least as costly as sending $B(1+\beta)$ bits
over the synchronous channel. Flipping this reasoning around,
the asynchronous channel effectively induces a codebook for
message/location pairs where the location is encoded via pulse
position modulation (PPM). From [6], optimal coding schemes are
similar to PPM in that the codewords consist almost entirely of
the zero cost symbol. This provides an intuitive justification for
why $(1+\beta)k_{\text{sync}}$ is an achievable cost per bit.

Theorem 2 can be extended to the (continuous-valued)
Gaussian channel, where the idle symbol $\star$ is the 0-symbol:

Theorem 3 (Asynchronous Capacity per Unit Cost for the
Gaussian Channel: Sub-exponential Delay Constraint). The
asynchronous capacity per unit cost for the Gaussian channel
with variance $N_0/2$, quadratic cost function (i.e., $k(x) = x^2$),
and delay exponent $\delta = 0$, is given by

$$C(\beta) = \frac{1}{1+\beta} \cdot \frac{\log e}{N_0}, \qquad \beta \ge 0.$$
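Inverting the expression in Theorem 3 gives the minimum energy per bit on the Gaussian channel. The sketch below is an illustration of that arithmetic; the function names are introduced here and the noise level is an arbitrary example value.

```python
import math

def gaussian_async_capacity(beta, noise_psd):
    """C(beta) = log2(e) / ((1 + beta) * N0), in bits per unit energy."""
    return math.log2(math.e) / ((1 + beta) * noise_psd)

def min_energy_per_bit(beta, noise_psd):
    """Reciprocal of the capacity per unit cost: (1 + beta) * N0 * ln 2."""
    return 1.0 / gaussian_async_capacity(beta, noise_psd)

# With N0 = 1 the synchronous limit (beta = 0) is ln 2, i.e. about -1.59 dB.
sync_energy = min_energy_per_bit(0.0, 1.0)
# beta = 1 means A = 2^B; the required energy per bit doubles (about +3 dB).
async_energy = min_energy_per_bit(1.0, 1.0)
sync_energy_db = 10 * math.log10(sync_energy)
```

The synchronous case reproduces the familiar limit of $N_0 \ln 2$ joules per bit, and asynchronism scales it by $1+\beta$.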



Theorem 1 can be extended to the case of a large delay
constraint, i.e., when $0 < \delta < \beta$. As for Theorem 1, the
following result holds irrespective of whether or not $\mathcal{X}$
contains $\star$. A simplification similar to Theorem 2 applies if $\mathcal{X}$
contains $\star$ and it has zero cost.

Theorem 4 (Asynchronous Capacity per Unit Cost: Exponential
Delay Constraint). The asynchronous capacity per unit
cost at delay exponent $\delta$, with $0 < \delta < \beta$, is given by

$$C(\beta, \delta) = C(\beta - \delta),$$

i.e., it is the same as the capacity per unit cost with delay
exponent $\delta = 0$, but with asynchronism exponent $\beta$ reduced to
$\beta - \delta$.
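The link between Theorem 4 and the cost formula quoted in the Introduction can be spelled out as a short worked equation. This is a sketch under two assumptions stated in the text: $\star$ belongs to $\mathcal{X}$ with zero cost, so that the simplification of Theorem 2 applies, and the window and delay scale as $A = 2^{\beta B}$ and $D = 2^{\delta B}$.

```latex
% Assumptions: A = 2^{\beta B}, D = 2^{\delta B}, and \star \in \mathcal{X} has zero cost,
% so that k_sync = 1 / \max_x [ D(Y_x \| Y_\star) / k(x) ] by (4).
\begin{align*}
C(\beta,\delta) &= C(\beta-\delta)
   = \frac{1}{1+\beta-\delta}\,\max_{x \in \mathcal{X}}\frac{D(Y_x\|Y_\star)}{k(x)}
   = \frac{1}{(1+\beta-\delta)\,k_{\mathrm{sync}}},\\[4pt]
\frac{B}{C(\beta,\delta)} &= \bigl(B + \beta B - \delta B\bigr)\,k_{\mathrm{sync}}
   = \bigl(B + \log A - \log D\bigr)\,k_{\mathrm{sync}}
   = \Bigl(B + \log\frac{A}{D}\Bigr)\,k_{\mathrm{sync}}.
\end{align*}
```

The second line is the cost of sending B bits at rate $C(\beta,\delta)$ per unit cost, and it matches the reduced cost stated for exponentially growing delay.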

The uniform distribution on $\nu$ in our model is not critical.
The next result extends Theorem 1 to the case where $\nu$ is
non-uniform. For a non-uniform distribution on $\nu$, what is
important turns out to be its 'smallest' set of mass points that
contains 'most' of the probability.

Below, $\nu_B$ denotes the arrival time random variable when
B bits of information have to be transmitted. (In Theorem 1,
$\nu_B$ has the uniform distribution over $\{1, 2, \ldots, 2^{\beta B}\}$.)

Theorem 5 (Asynchronous Capacity per Unit Cost With
Non-uniform Arrival Time: Sub-exponential Delay Constraint).
Define

$$\beta = \inf_{\{\varepsilon_B\}} \lim_{B \to \infty} \frac{\log S(\varepsilon_B)}{B}, \tag{6}$$

where the infimum is with respect to all sequences $\{\varepsilon_B\}$ of
nonnegative numbers such that $\lim_{B \to \infty} \varepsilon_B = 0$, where $S(\varepsilon_B)$
denotes the size of the smallest set with probability at least
$1 - \varepsilon_B$, and it is assumed that the limit in (6) exists.

Then, the asynchronous capacity per unit cost at delay
exponent $\delta = 0$ is given by

$$C(\beta) = \max_X \min\left\{ \frac{I(X;Y)}{E[k(X)]},\ \frac{I(X;Y) + D(Y\|Y_\star)}{E[k(X)](1+\beta)} \right\}.$$

Although the formula for $\beta$ in (6) appears unwieldy, in many
cases it can easily be evaluated. For example, in many cases,
such as for the uniform or the geometric distributions, the
formula reduces to the normalized entropy

$$\beta = \lim_{B \to \infty} H(\nu_B)/B.$$

There are cases, however, where $\beta$ doesn't reduce to the
normalized entropy. For instance, consider the case when
$\nu_B = 1$ with probability 1/2, and $\nu_B = i$ with probability
$(1/2)\,2^{-\beta B}$ for $i = 2, \ldots, 2^{\beta B} + 1$. Then,

$$\beta = 2 \lim_{B \to \infty} H(\nu_B)/B.$$
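The gap between the normalized entropy and the quantity in (6) for this last example can be checked numerically. In the sketch below the growth exponent of the support is called `exponent` (an illustrative name), the product `exponent * B` is assumed to be an integer, and both quantities are computed in closed form for one finite B rather than in the limit.

```python
import math

def example_distribution_stats(B, exponent, eps):
    """Return (H(nu_B) / B, log2(S(eps)) / B) for the two-level example.

    nu_B equals 1 with probability 1/2 and is uniform over n = 2^(exponent * B)
    further points, each carrying probability 1 / (2 n).
    """
    n = 2 ** round(exponent * B)
    # Entropy: 1/2 bit from the heavy point plus (1/2) * (1 + log2 n) from the rest.
    entropy = 1.0 + 0.5 * math.log2(n)
    # Smallest set of probability >= 1 - eps: take the heavy point first, then
    # as many light points as needed to cover the remaining 1/2 - eps.
    if eps >= 0.5:
        size = 1
    else:
        size = 1 + math.ceil((1 - 2 * eps) * n)
    return entropy / B, math.log2(size) / B

normalized_entropy, normalized_log_size = example_distribution_stats(
    B=40, exponent=0.5, eps=0.01
)
```

For B = 40 the normalized entropy is 0.275 while the normalized log-size is just under 0.5. As B grows they approach `exponent / 2` and `exponent` respectively, which is the factor of two stated above.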

IV. Proofs of Results 

Achievability of Theorem 1: We first show the existence
of a random code with the desired properties. Then, via an
expurgation argument, we show the existence of a deterministic
code achieving the same (asymptotic) performance as the
random code.

Fix some arbitrary distribution P on $\mathcal{X}$. Let X be the input
having that distribution and let Y be the corresponding output.
Given B bits of information to be transmitted, the codebook
$\mathcal{C}$ is randomly generated as follows. For each message
$m \in \{1, 2, \ldots, 2^B\}$, randomly generate a length-N sequence $x^N$
i.i.d. according to P. If $x^N$ satisfies the 'constant
composition' property, i.e., its empirical distribution is
sufficiently close to P, we let $c(m) = x^N$. Otherwise, we repeat
the procedure until we generate a sequence sufficiently close to P.
It is not hard to see that, for a fixed m, with high probability no
repetition will be required to generate $c(m)$: the constant
composition property holds with probability tending to one as
$N \to \infty$. The obtained codebook is thus essentially of constant
composition, i.e., each symbol appears roughly the same number of
times across codewords.

The sequential typicality decoder operates as follows. At
time t, for all $m \in \{1, 2, \ldots, 2^B\}$, the typicality decoder
computes the empirical distributions

$$\hat{P}_{C(m),\,Y_{t-N+1}^{t}}(\cdot,\cdot)$$

induced by $C(m)$ and the N output symbols $Y_{t-N+1}^{t}$.⁷ If there
is a unique message m for which⁸

$$\|\hat{P}_{C(m),\,Y_{t-N+1}^{t}}(\cdot,\cdot) - P(\cdot)Q(\cdot|\cdot)\| \le 1/\log N,$$

the decoder stops and declares that message m was sent. If
more than one codeword is typical, the decoder stops and
declares one of the corresponding messages at random.⁹ If
no codeword is typical at time t, the decoder moves one step
ahead and repeats the procedure based on $Y_{t-N+2}^{t+1}$.
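The decoding rule described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the helper names are hypothetical, logarithms are base 2 as in the rest of the paper, the codeword length is assumed to be at least 2 so that $1/\log N$ is finite, and the target distribution $P(\cdot)Q(\cdot|\cdot)$ is passed in as a dictionary over (input, output) pairs.

```python
import math
import random
from collections import Counter

def joint_type(xs, ys):
    """Empirical joint distribution of the pairs (x_i, y_i)."""
    n = len(xs)
    return {pair: count / n for pair, count in Counter(zip(xs, ys)).items()}

def l1_distance(p, q):
    """L1 distance between two distributions stored as dicts."""
    return sum(abs(p.get(key, 0.0) - q.get(key, 0.0)) for key in set(p) | set(q))

def sequential_typicality_decoder(outputs, codebook, target, rng):
    """Slide a length-N window over the outputs; stop at the first time t
    at which some codeword is typical with the window.

    Returns (t, message index), with t counted from 1, or (None, None)
    if no window is ever typical.
    """
    N = len(codebook[0])
    threshold = 1.0 / math.log2(N)
    for t in range(N, len(outputs) + 1):
        window = outputs[t - N:t]  # Y_{t-N+1}, ..., Y_t in 1-based indexing
        typical = [m for m, codeword in enumerate(codebook)
                   if l1_distance(joint_type(codeword, window), target) <= threshold]
        if typical:
            return t, rng.choice(typical)  # ties are broken at random
    return None, None

# Toy setting: noiseless data symbols (output = input), idle symbol seen as 2.
toy_codebook = [(0, 1, 0, 1), (0, 0, 1, 1)]
toy_target = {(0, 0): 0.5, (1, 1): 0.5}
toy_outputs = [2, 2, 0, 0, 1, 1, 2, 2]  # codeword 1 sent starting at time 3
stop_time, decoded = sequential_typicality_decoder(
    toy_outputs, toy_codebook, toy_target, random.Random(0)
)
```

In the toy run the decoder stops at the first time the window covers the whole codeword, which is the start time plus N minus 1.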

Suppose message m is transmitted. The error event that the
decoder declares some message $m' \ne m$ can be decomposed
into the union of two error events:

• $\mathcal{E}_1$: the decoder stops at time t between $\nu$ and $\nu + 2N - 2$
(including $\nu$ and $\nu + 2N - 2$), and declares $m'$;

• $\mathcal{E}_2$: the decoder stops either at some time t before $\nu$ or
from $\nu + 2N - 1$ onwards, and declares $m'$.

For the first error event, for some $0 \le k \le N - 1$ the first
or the last k symbols of $Y^N$ are generated by noise, and the
remaining $N - k$ symbols are generated by the sent codeword.
The probability that such a $Y^N$ yields a type¹⁰ J that is
jointly typical with $P(\cdot)Q(\cdot|\cdot)$, that is

$$\|J(\cdot,\cdot) - P(\cdot)Q(\cdot|\cdot)\| \le 1/\log N,$$
is upper bounded as

[Display (7), a chain of bounds on the probability of this event, is not
recoverable from this copy. Consistent with the bound on $P_m(\mathcal{E}_1)$
derived from it, the chain ends in $2^{-N(I(X;Y) - \varepsilon)}$.] (7)

for any $\varepsilon > 0$ and all N large enough, where $H(J_X)$ denotes
the entropy of the left marginal of J,

$$H(J_{X|Y}) = -\sum_{b \in \mathcal{Y}} J_Y(b) \sum_{a \in \mathcal{X}} J_{X|Y}(a|b) \log J_{X|Y}(a|b),$$

and $I(J)$ denotes the mutual information induced by J.

⁷ We use here capital letters to denote codewords to emphasize that they
are randomly generated.

⁸ $\|\cdot\|$ refers to the $L_1$-norm.

⁹ The notion of typicality we use is often referred to as 'strong typicality'
in the literature.

¹⁰ Recall that the type of a string $y^N \in \mathcal{Y}^N$, denoted by $\hat{P}_{y^N}$, assigns to
each $a \in \mathcal{Y}$ a probability that corresponds to the frequency of occurrences of
a within $y^N$ [2, Chapter 1.2]. For instance, if $y^3 = (0, 1, 0)$, then $\hat{P}_{y^3}(0) =
2/3$ and $\hat{P}_{y^3}(1) = 1/3$. The joint type induced by a pair of strings $x^N \in
\mathcal{X}^N$, $y^N \in \mathcal{Y}^N$ is defined similarly.

The first equality in (7) follows from the independence of
$C^N(m')$ and $Y^N$, since $Y^N$ corresponds to the output of
$C^N(m)$. For the first inequality, note that if the codewords
were randomly generated with each component of each codeword
i.i.d. according to P, we would get from [1, Theorem
11.1.2, p. 349]

$$P(X^N = x^N) = 2^{-N(H(J_X) + D(J_X\|P))}.$$

Since a codeword with each component generated i.i.d. according
to P satisfies the constant composition property with
probability tending to one as $N \to \infty$, we get

$$P(X^N = x^N) = 2^{-N(H(J_X) + D(J_X\|P))}(1 + o(1))$$

as $N \to \infty$, which justifies the second inequality. The third
inequality follows from [2, Lemma 2.5, p. 31]. The fifth
inequality holds for any $\varepsilon > 0$ and all N large enough, since
by assumption J is close to PQ.

Thus, taking a union bound over time (at most 2N instants), we
obtain the upper bound

$$P_m(\mathcal{E}_1) \le 2N \cdot 2^{-N(I(X;Y) - \varepsilon)},$$

valid for N large enough.

For the second error event, pure noise produces some
output $Y^N$ that is jointly typical with $C(m')$. From Sanov's
theorem and the continuity of the Kullback-Leibler distance
(needed only in the first argument), the probability of this event
happening at a particular time is upper bounded by

$$2^{-N(D(XY\|X,Y_\star) - \varepsilon)} = 2^{-N(I(X;Y) + D(Y\|Y_\star) - \varepsilon)}$$

for all N large enough. Here $D(XY\|X,Y_\star)$ refers to the
Kullback-Leibler distance between, on the one hand, the joint
distribution of X and Y and, on the other hand, the product
of the distributions of X and $Y_\star$. Hence, taking a union bound
over all times where noise could produce such output, we get

$$P_m(\mathcal{E}_2) \le A \cdot 2^{-N(I(X;Y) + D(Y\|Y_\star) - \varepsilon)}$$

for all N large enough.
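Combining, we get

$$P_m(m \to m') \le 2N \cdot 2^{-N(I(X;Y) - \varepsilon)} + A \cdot 2^{-N(I(X;Y) + D(P_Y\|P_{Y_\star}) - \varepsilon)}.$$

Hence, taking a union bound over all possible wrong
messages, we obtain for all $\varepsilon > 0$

$$P_m(\mathcal{E}) \le 2^B \left( 2N \cdot 2^{-N(I(X;Y) - \varepsilon)} + A \cdot 2^{-N(I(X;Y) + D(P_Y\|P_{Y_\star}) - \varepsilon)} \right)$$

for N large enough and all m. Since the above bound is valid
for a randomly generated code we deduce that

$$E_{\mathcal{C}}\big(\bar{P}(\mathcal{E}|\mathcal{C})\big) = P_m(\mathcal{E}) \le 2^B \left( 2N \cdot 2^{-N(I(X;Y) - \varepsilon)} + A \cdot 2^{-N(I(X;Y) + D(P_Y\|P_{Y_\star}) - \varepsilon)} \right) = \varepsilon_1(N), \tag{8}$$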



where $\bar{P}$ denotes probability averaged over messages.

We now show that the delay of our coding scheme in the sense of
Definition 3 is N. If $\tau > \nu + N$ then necessarily it means that
$Y_{\nu}^{\nu+N-1}$ isn't typical with the sent codeword. From Sanov's
Theorem, the probability of the latter event tends to zero
as $N \to \infty$. Therefore, denoting by $\bar{D}(\mathcal{C}, \varepsilon_2(N))$ the delay
averaged over messages,¹¹ we get

$$P\big(\bar{D}(\mathcal{C}, \varepsilon_2(N)) \le N\big) \ge 1 - \varepsilon_2(N), \tag{9}$$

where $\varepsilon_2(N)$ is a function that tends to zero as $N \to \infty$.

So far we have proved that a random code has error
probability, averaged over messages, less than $\varepsilon_1(N)$ and
delay, averaged over messages, less than N with probability
at least $1 - \varepsilon_2(N)$. Moreover, all codewords in our random
ensemble have cost $N E[k(X)](1 + o(1))$ as $N \to \infty$.

We now show the existence of a non-random code with
maximum delay close to N and maximum error probability
within a constant factor of $\varepsilon_1(N)$ via a two-step expurgation
procedure over codes and messages.

Define the events

$$A_1 = \{\bar{P}(\mathcal{E}|\mathcal{C}) \le \varepsilon_1(N)(1 + \eta_1)\},$$

$$A_2 = \{\bar{D}(\mathcal{C}, \varepsilon_2(N)) \le N\},$$

where $\eta_1 > 0$ is some arbitrary constant, and where $\bar{P}(\mathcal{E}|\mathcal{C})$
denotes the average, over messages, error probability given
code $\mathcal{C}$ is used. We then deduce that

$$P(A_1 \cap A_2) \ge 1 - \frac{1}{1 + \eta_1} - \varepsilon_2(N), \tag{10}$$

where we used Markov's inequality to establish that

$$P(A_1^c) \le \frac{1}{1 + \eta_1}.$$

From (10), for all N large enough there exists a non-random
code $\mathcal{C}$ whose average, over messages, delay and error
probability satisfy

$$\bar{D}(\mathcal{C}, \varepsilon_2(N)) \le N$$

and

$$\bar{P}(\mathcal{E}|\mathcal{C}) \le \varepsilon_1(N)(1 + \eta_1),$$

respectively. We now strengthen the above conclusion for
maximum delay and error probability.

For an arbitrarily small $\eta_2 > 0$, remove all the codewords
in $\mathcal{C}$ with delay larger than or equal to $(1 + \eta_2)N$. From the
remaining set of codewords, at least

$$2^B\big(1 - 1/(1 + \eta_2)\big)$$

of them, remove the half of the codewords with the highest
error probability. This final set $\mathcal{C}'$ contains at least
$2^{B-1}\big(1 - 1/(1 + \eta_2)\big)$ codewords and has maximum error probability
and maximum delay upper bounded as

$$P(\mathcal{E}|\mathcal{C}') \le \varepsilon_1(N)\, \frac{2(1 + \eta_1)}{1 - 1/(1 + \eta_2)} \tag{11}$$

and

$$D(\mathcal{C}', \varepsilon_2(N)) \le N(1 + \eta_2), \tag{12}$$

respectively.

¹¹ I.e., $\bar{D}(\mathcal{C}, \varepsilon)$ denotes the smallest $d \ge 0$ such that
$\frac{1}{M} \sum_{m=1}^{M} P_m(\tau - \nu \le d) \ge 1 - \varepsilon$.

Finally, recall that by construction, all the codewords have
cost $N E[k(X)](1 + o(1))$ as $N \to \infty$. Hence, for N large
enough

$$K(\mathcal{C}') \le N E[k(X)](1 + \eta_2). \tag{13}$$

Therefore, from (11), (12), and (13), our non-random code $\mathcal{C}'$
has an error probability that tends to zero as $\varepsilon_1(N)$ vanishes,
a (maximum) delay upper bounded by $N(1 + \eta_2)$, and a cost
at most $N E[k(X)](1 + \eta_2)$.

Now fix the ratio B/N, thereby imposing a delay linear in
B, and substitute $A = 2^{\beta B}$ in the definition of $\varepsilon_1(N)$ (see
(8)). Then, $P(\mathcal{E}|\mathcal{C}')$ goes to zero as $B \to \infty$ provided that

$$\frac{B}{N} < \min\left\{ I(X;Y),\ \frac{I(X;Y) + D(Y\|Y_\star)}{1 + \beta} \right\}.$$



Since the cost of $\mathcal{C}'$ is at most $N E[k(X)](1 + \eta_2)$, the above
condition is implied by the following condition

$$\frac{B}{K(\mathcal{C}')} < \min\left\{ \frac{I(X;Y)}{(1 + \eta_2) E[k(X)]},\ \frac{I(X;Y) + D(Y\|Y_\star)}{E[k(X)](1 + \eta_2)(1 + \beta)} \right\}.$$

Maximizing over all input distributions and using the fact
that $\eta_1 > 0$ and $\eta_2 > 0$ are arbitrary proves that (3)
is asymptotically achieved by non-random codes with delay
linear in B.
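The role of the condition on B/N can be checked numerically by evaluating the bound $\varepsilon_1(N)$ of (8) in the log domain, with $A = 2^{\beta B}$. The sketch below uses made-up values for $I(X;Y)$, $D(Y\|Y_\star)$, and $\varepsilon$; the function names are illustrative.

```python
import math

def log2_add(a, b):
    """Return log2(2^a + 2^b) without overflowing for large |a| or |b|."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

def log2_epsilon1(B, N, beta, mutual_info, kl_idle, eps):
    """log2 of 2^B (2N 2^{-N(I - eps)} + A 2^{-N(I + D - eps)}), A = 2^{beta B}."""
    overlap_term = B + math.log2(2 * N) - N * (mutual_info - eps)
    noise_term = B + beta * B - N * (mutual_info + kl_idle - eps)
    return log2_add(overlap_term, noise_term)

# Illustrative values: I(X;Y) = 0.5, D(Y||Y_star) = 0.5, beta = 1, so the
# threshold min{I, (I + D) / (1 + beta)} equals 0.5 bits per symbol.
below_threshold = log2_epsilon1(B=400, N=1000, beta=1.0,
                                mutual_info=0.5, kl_idle=0.5, eps=0.01)
above_threshold = log2_epsilon1(B=600, N=1000, beta=1.0,
                                mutual_info=0.5, kl_idle=0.5, eps=0.01)
```

At B/N = 0.4 the logarithm of the bound is strongly negative, so the bound is tiny. At B/N = 0.6 it is positive and the bound is vacuous.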

From the above analysis it is easy to see that whenever there
exists some input X such that $I(X;Y) > 0$ while $E[k(X)] = 0$
(which requires $\mathcal{X}$ to contain more than one zero cost symbol), the
asynchronous capacity per unit cost is infinite. ■

Achievability of Theorem 4: The achievability scheme
for Theorem 4 is similar to the achievability scheme used
to prove Theorem 1. The only difference is that now the
transmitter does not start transmitting at time $\nu$. Instead, the
transmitter first reduces the receiver's time uncertainty about
the beginning of codeword transmission by waiting to transmit
until the first multiple of $2^{\delta B}$ larger than $\nu$. This effectively
reduces the uncertainty about codeword transmission from $2^{\beta B}$
to $2^{(\beta - \delta)B}$, and one proves that $C(\beta - \delta)$ is achievable with
delay $O(2^{\delta B})$ by repeating the arguments of the achievability
of Theorem 1, again fixing the ratio B/N. Hence, the
blocklength is exponentially smaller than the delay. This is
in contrast with the achievability of Theorem 1 where delay
and blocklength are the same (information transmission starts
at the same time that information arrives, i.e., $\sigma(\nu, m) = \nu$).

■ 
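The waiting rule used in this scheme can be sketched as follows. Two assumptions are made here and are not taken from the paper: "larger than $\nu$" is read as "at least $\nu$", so that an arrival at time A can still be sent within the window, and A is taken to be a multiple of the step $2^{\delta B}$. The function name `start_time` is illustrative.

```python
def start_time(nu, step, A):
    """Smallest multiple of `step` that is at least nu.

    nu   -- arrival time, 1 <= nu <= A
    step -- spacing of the allowed start times, e.g. 2^(delta * B)
    A    -- size of the uncertainty window, assumed to be a multiple of step
    """
    if A % step != 0:
        raise ValueError("this sketch assumes A is a multiple of step")
    if not 1 <= nu <= A:
        raise ValueError("nu must lie in {1, ..., A}")
    return ((nu + step - 1) // step) * step

# A = 16 arrival times collapse onto A / step = 4 possible start times.
possible_starts = {start_time(nu, 4, 16) for nu in range(1, 17)}
waiting_times = [start_time(nu, 4, 16) - nu for nu in range(1, 17)]
```

The receiver then only has to resolve A / step possible start times, at the price of a waiting time smaller than the step.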

Achievability of Theorem 5: To prove the achievability
part of Theorem 5, one applies the same arguments as for
the achievability of Theorem 1, replacing the uncertainty
set $\{1, 2, \ldots, A\}$ by a 'typical' set of size $S(\varepsilon_B)$ whose
probability, under the arrival time distribution, is at least $1 - \varepsilon_B$
(such a set exists by assumption). Note that the event when
the arrival time $\nu$ doesn't belong to the typical set affects
neither the cost nor the delay (asymptotically). ■

Converse of Theorem 1: Recall that delay refers to the
elapsed time between $\nu$ and $\tau$, and need not coincide with the
codeword length N. Throughout this proof it is convenient
to refer to as 'codeword' the sequence of symbols transmitted
from time $\nu$ until time $\nu + d - 1$, where d is the achieved
(maximum) delay. (Hence, with this definition a codeword is
mostly composed of $\star$ if $N \ll d$.) Accordingly, a codebook
now represents a collection of the newly defined codewords.
Assume that $\{\mathcal{C}_B\}$ achieves a rate per unit cost $R > 0$ at
timing uncertainty per information bit $\beta$ and delay exponent $\delta = 0$.
To simplify notation, we denote the delay $D(\mathcal{C}_B, \varepsilon_B)$, where
$\varepsilon_B \to 0$, by $d_B$. By assumption (we consider Theorem 1),
the delay exponent is zero, i.e.,

$$\limsup_{B \to \infty} \frac{\log d_B}{B} = 0.$$

Fig. 2. Parsing of the entire received sequence of size $A + N - 1$ into r
blocks $y^{(1)}, y^{(2)}, \ldots, y^{(r)}$ of length $d_B$.

We now show that for any $\eta > 0$, R and $\beta$ satisfy

$$R\, E[k(X)] \le I(X;Y)(1 + \eta) \tag{14}$$

and

$$R\, E[k(X)](1 + \beta) \le D(XY\|X,Y_\star) + \eta \tag{15}$$

for B large enough, where $X \sim P_B$, and where $P_B$ denotes
the type containing the most codewords from $\mathcal{C}_B$.¹²

First we prove (14). Let $\mathcal{C}'_B$ be the constant composition subset of $\mathcal{C}_B$ with the largest number of elements, and let $P_B$ be the corresponding type. $\mathcal{C}'_B$ is clearly a good code for the synchronous channel, i.e., if we reveal $\nu$ to the receiver and decoding happens at time $\nu + d_B - 1$, it is possible to achieve error probability less than $\varepsilon_B$. It follows from [2, Lemma 1.4, p. 104] that for any $\eta > 0$,

$$\frac{\log |\mathcal{C}'_B|}{d_B} \le I(X;Y)(1+\eta) \tag{16}$$

for all $B$ large enough. Then, to obtain (14) we use the fact that the number of types grows polynomially with $d_B$ [2, Lemma 2.2, p. 28], and that the cost of $\mathcal{C}'_B$ is equal to $d_B \cdot \mathbb{E}[k(X)]$.
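The step from (16) to (14) relies on the number of types being polynomial in the block length, so that restricting to the most populated type costs a vanishing fraction of the rate. A minimal sketch of this counting fact follows; the function names `num_types` and `rate_loss_bits` are illustrative choices, not notation from the paper.

```python
from math import comb, log2

def num_types(n, alphabet_size):
    """Number of types (empirical distributions) of length-n sequences.

    A type is a vector of nonnegative counts summing to n, so the count
    follows from the stars-and-bars formula.
    """
    return comb(n + alphabet_size - 1, alphabet_size - 1)

def rate_loss_bits(n, alphabet_size):
    """Upper bound on the per-symbol rate lost by keeping one type only.

    The largest type class holds at least a 1/(n+1)^|X| fraction of the
    codewords, so log|C'| >= log|C| - |X| * log2(n + 1).
    """
    return alphabet_size * log2(n + 1) / n

for n in (10, 100, 10_000):
    print(n, num_types(n, 3), (n + 1) ** 3, rate_loss_bits(n, 3))
```

The printed loss decreases towards zero as the block length grows, which is why $\log|\mathcal{C}'_B|$ and $\log|\mathcal{C}_B|$ agree to first order.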
To prove (15), let us reveal to the receiver the complete output sequence

$$y_1, y_2, \ldots, y_{A+N-1},$$

together with the fact that the message was sent in one of the

$$\frac{A + d_B - (\nu \bmod d_B)}{d_B}$$

consecutive (disjoint) blocks of duration $d_B$, as shown in Fig. 2. With this additional knowledge, the optimal MAP decoder is able to simultaneously output the sent codeword and the block of size $d_B$ corresponding to the actual transmission period, with probability at least $1 - 2\varepsilon_B > 0$. To see this, note that $\mathcal{C}_B$ achieves error probability $\varepsilon_B$, and that the (maximum) communication delay is less than $d_B$ with probability at least $1 - \varepsilon_B$.

Footnote: Recall that the right argument of the Kullback-Leibler divergence on the right-hand side of (15) refers to the product of the distributions of $X$ and $Y_\star$.

To develop some intuition for (15), first consider the case where the sent codeword $c(m)$ is known at the decoder, so that the decoder's task is only to output the block of size $d_B$ that corresponds to the period when the codeword is sent. We show that if $\beta$ is sufficiently large, the decoder will not be able to perform this task reliably, because the noise is likely to produce several blocks that look as though they were generated by $c(m)$. More precisely, we show that the MAP decoder has a large probability of error whenever, for some $\eta > 0$ and all $B$ large enough,

$$B\beta > d_B\big(D(XY \,\|\, X, Y_\star) + \eta\big). \tag{17}$$



First note that the MAP decoder will fail with probability at least 1/2 whenever a pure noise block of $d_B$ output symbols induces the same joint type with $c(m)$ as the block of output symbols $Y_\nu, \ldots, Y_{\nu+d_B-1}$. Since $c(m)$ has type $P_B$, the joint type of $c(m)$ and $Y_\nu, \ldots, Y_{\nu+d_B-1}$ is close to $P_B Q$ with probability $1 - o(1)$, i.e., $Y_\nu, \ldots, Y_{\nu+d_B-1}$ is (strongly) typical with $c(m)$ with high probability. Therefore, to show that the MAP decoder fails with probability bounded away from 0, it suffices to show that for any conditional type $\hat{Q} \approx Q$, with high probability there exists at least one pure noise block that induces the joint type $P_B \hat{Q}$ with $c(m)$. Here $\hat{Q} \approx Q$ means that

$$\| P_B \hat{Q} - P_B Q \| \le 1/\log d_B.$$

From standard results in large deviations, the probability that one single pure noise block induces the joint type $P_B \hat{Q}$ with $c(m)$ is $\doteq 2^{-d_B D(X\hat{Y} \| X, Y_\star)}$, where $\hat{Y}$ denotes the channel output when $X$ is the input to $\hat{Q}$. Therefore, the number of pure noise blocks that induce the joint type $P_B \hat{Q}$ with $c(m)$ is a binomial random variable with mean

$$\frac{A}{d_B}\, 2^{-d_B D(X\hat{Y} \| X, Y_\star)} \doteq \frac{A}{d_B}\, 2^{-d_B D(XY \| X, Y_\star)},$$

where the dot-equality uses $\hat{Q} \approx Q$. Hence, since $d_B$ is sub-exponential in $B$ and $A = 2^{\beta B}$, it follows from (17) that the mean grows at least as $2^{\delta B}$ for some $\delta > 0$. Since the variance of a binomial random variable is at most its mean, Chebyshev's inequality implies that with probability $1 - o(1)$ there is at least one pure noise block that induces the joint type $P_B \hat{Q}$ with $c(m)$. Therefore, the MAP decoder fails with probability at least $1/2 - o(1)$ whenever (17) holds. Hence, whenever there is only a single message and the decoder's only task is to locate it, necessarily we have

$$B\beta \le d_B\big(D(XY \,\|\, X, Y_\star) + \eta\big)$$

for any $\eta > 0$ and all $B$ large enough.
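The Chebyshev step can be checked numerically. For a binomial count $N$ with mean $\mu$, the event $N = 0$ implies $|N - \mu| \ge \mu$, so $\Pr(N = 0) \le \mathrm{Var}(N)/\mu^2 \le 1/\mu$. The sketch below compares this bound with the exact probability; the function names and the sample values of `n` and `p` are illustrative assumptions.

```python
def prob_no_hit_exact(n, p):
    """Exact probability that none of n independent blocks is confusable."""
    return (1.0 - p) ** n

def prob_no_hit_chebyshev(n, p):
    """Chebyshev upper bound on P(N = 0) for N ~ Binomial(n, p)."""
    mean = n * p
    variance = n * p * (1.0 - p)
    # {N = 0} is contained in {|N - mean| >= mean}.
    return min(1.0, variance / mean ** 2)

# Hypothetical numbers: one million noise blocks, each confusable w.p. 1e-4.
n, p = 10 ** 6, 1e-4
print(prob_no_hit_exact(n, p), prob_no_hit_chebyshev(n, p), 1.0 / (n * p))
```

The bound decays only like $1/\mu$, far more slowly than the exact probability, but that is enough here because the mean grows exponentially in $B$.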

We now extend the above argument to obtain inequality (15), using the fact that the decoder does not know the transmitted message a priori. Because of this, the decoder's task is more difficult to perform; pure noise can induce an error whenever it generates a block that is typical with any of the messages. The key element in our analysis consists in showing that the 'typicality' regions are essentially disjoint. This, together with the above argument for the single message case, will yield the desired result.

Footnote: We use the notation $f(B) \doteq g(B)$ whenever the functions $f$ and $g$ are exponentially equal, i.e., if

$$\lim_{B \to \infty} \frac{1}{B} \log_2 f(B) = \lim_{B \to \infty} \frac{1}{B} \log_2 g(B).$$

First, note that if $\mathcal{C}_B$ achieves a low error probability on the asynchronous channel, then it must also achieve a low error probability on the synchronous channel: if we reveal $\nu$ to the decoder, the channel becomes synchronous and the error probability cannot increase. In turn, if $\mathcal{C}_B$ achieves a low error probability on the synchronous channel $Q$, then the (strong) typicality regions associated with the codewords must have 'small overlap.' Formally, it can be shown that there exists a subset $\mathcal{C}''_B \subset \mathcal{C}_B$ with the following properties:

a. $\log |\mathcal{C}''_B| = \log |\mathcal{C}_B| - o(d_B)$;

b. Every $c(m) \in \mathcal{C}''_B$ has type $P_B$;

c. For each $c(m) \in \mathcal{C}''_B$ and each $\hat{Q} \approx Q$, there exists $\mathcal{S}(m, \hat{Q})$, a subset of the set of output sequences that induce the joint type $P_B \hat{Q}$ with $c(m)$. For fixed $\hat{Q}$, the sets $\mathcal{S}(m, \hat{Q})$ are disjoint across messages, and their union has at least $|\mathcal{C}''_B|\, 2^{d_B H(\hat{Y}|X) - o(d_B)}$ sequences.

Properties a and b are technical constraints that simplify the proofs. Property c formally captures the notion that because $\mathcal{C}_B$ achieves low error probability on the synchronous channel, the strong typicality regions cannot overlap much. Note that for a fixed $c(m)$, the set of output strings that induce the joint type $P_B \hat{Q}$ with $c(m)$ has size $2^{d_B H(\hat{Y}|X) - o(d_B)}$. Therefore, c says that the size of the union is essentially maximal.

We first show how (15) follows from the existence of a subset of codewords $\mathcal{C}''_B \subset \mathcal{C}_B$ possessing properties a-c. To do this, we essentially mimic our argument for the case where the decoder knows $c(m)$.

As for the single message case, let us parse the output sequence into blocks of size $d_B$.

As we saw above, the probability that one pure noise block induces the joint type $P_B \hat{Q}$ with $c(m)$ is $\doteq 2^{-d_B D(X\hat{Y} \| X, Y_\star)}$. Because now the decoder does not know which codeword was sent, the MAP decoder fails with probability at least 1/2 whenever one pure noise block lies in the union of the sets $\mathcal{S}(m, \hat{Q})$. From properties a-c, it follows that the probability that one single block produces an output contained in the union of the sets $\mathcal{S}(m, \hat{Q})$ is $|\mathcal{C}''_B|\, 2^{-d_B D(X\hat{Y} \| X, Y_\star) - o(d_B)}$. Thus, the number of blocks in the parsing that produce outputs contained in the union of the sets $\mathcal{S}(m, \hat{Q})$ is a binomial random variable with mean

$$\frac{A}{d_B}\, |\mathcal{C}''_B|\, 2^{-d_B D(X\hat{Y} \| X, Y_\star) - o(d_B)} = \frac{A}{d_B}\, |\mathcal{C}''_B|\, 2^{-d_B D(XY \| X, Y_\star) - o(d_B)}$$

because $\hat{Q} \approx Q$. Note that (16) implies that $d_B = \Omega(B)$. Assuming that $d_B$ is sub-exponential in $B$ and that

$$B(1+\beta) > d_B\big(D(XY \,\|\, X, Y_\star) + \eta\big), \tag{18}$$

it follows that the mean grows as $2^{\delta B}$ for some $\delta > 0$. As before, Chebyshev's inequality implies that with probability $1 - o(1)$, there is at least one pure noise block that produces an output contained in the union of the sets $\mathcal{S}(m, \hat{Q})$. Therefore, the MAP decoder fails with probability at least $1/2 - o(1)$ whenever (18) holds for some $\eta > 0$ and all $B$ large enough. This implies (15), completing the proof of the converse for Theorem 1.
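To see how condition (18) controls the growth of the mean number of confusable noise blocks, one can evaluate its base-2 logarithm directly. The sketch below is a rough illustration under simplifying assumptions that are not in the paper: it takes $A = 2^{\beta B}$, approximates the code size by $2^B$, and drops all $o(d_B)$ terms. The function name and the sample numbers are hypothetical.

```python
import math

def log2_mean_confusable_blocks(B, beta, d_B, divergence):
    """log2 of (A / d_B) * |C| * 2^(-d_B * D), ignoring o(d_B) corrections.

    Assumes A = 2^(beta * B) and |C| = 2^B (both simplifications).
    """
    return beta * B + B - d_B * divergence - math.log2(d_B)

# Hypothetical values: B = 1000 bits, beta = 1, divergence D = 1 bit per symbol.
short_delay = log2_mean_confusable_blocks(B=1000, beta=1.0, d_B=1500, divergence=1.0)
long_delay = log2_mean_confusable_blocks(B=1000, beta=1.0, d_B=2500, divergence=1.0)
print(short_delay, long_delay)
```

With the shorter delay, $B(1+\beta) > d_B D$ and the exponent is large and positive, so many noise blocks look like codewords; with the longer delay the exponent is negative and confusable noise blocks become rare.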

We now prove the existence of a subset $\mathcal{C}''_B \subset \mathcal{C}_B$ possessing properties a-c.

Because $P_B$ is defined as the type with the most codewords from $\mathcal{C}_B$, and the number of types is polynomial in $d_B$, we deduce that the code $\mathcal{C}'_B$ defined as the set of codewords in $\mathcal{C}_B$ with type $P_B$ satisfies properties a and b. The code $\mathcal{C}''_B$ is defined as the largest subset of $\mathcal{C}'_B$ such that there exists a corresponding deterministic decoder with the property that the maximum error probability when $\mathcal{C}''_B$ is used over the synchronous channel $Q$ is at most $2^{-d_B/\log\log d_B}$. So defined, $\mathcal{C}''_B$ satisfies properties b and c. Property b is clear. To see that property c holds, let $\mathcal{D}(c(m))$ denote the decoding region associated with codeword $c(m) \in \mathcal{C}''_B$; the decoding regions are well-defined and disjoint because the decoder is deterministic. Let $P_m$ denote the probability distribution of the output of the synchronous channel $Q$ when $c(m)$ is the channel input. Also, for a given $\hat{Q} \approx Q$, let $\mathcal{T}_{\hat{Q}}[c(m)]$ denote the set of channel outputs inducing the joint type $P_B \hat{Q}$ with $c(m)$. Then,

$$P_m\big(\mathcal{T}_{\hat{Q}}[c(m)] \cap \mathcal{D}(c(m))\big) \ge P_m\big(\mathcal{T}_{\hat{Q}}[c(m)]\big) - 2^{-d_B/\log\log d_B} \ge (1 - o(1))\, P_m\big(\mathcal{T}_{\hat{Q}}[c(m)]\big),$$

where the first inequality follows from the definition of $\mathcal{D}(c(m))$ and the second inequality follows because $\hat{Q} \approx Q$ and $d_B = \Omega(B)$ imply that for all sufficiently large $B$, $P_m(\mathcal{T}_{\hat{Q}}[c(m)]) \ge 2^{-d_B/\sqrt{\log d_B}}$. Thus, conditioned on sending $c(m)$ and conditioned on the output landing within $\mathcal{T}_{\hat{Q}}[c(m)]$, the probability of $\mathcal{D}(c(m))$ is still at least $1 - o(1)$. Finally, this conditional distribution is uniform over $\mathcal{T}_{\hat{Q}}[c(m)]$, so property c is satisfied by choosing $\mathcal{S}(m, \hat{Q}) = \mathcal{D}(c(m)) \cap \mathcal{T}_{\hat{Q}}[c(m)]$.

It remains to verify property a, i.e., that there exists a large subset of $\mathcal{C}'_B$ such that there exists a corresponding (deterministic) decoder with the property that the maximum error probability is at most $2^{-d_B/\log\log d_B}$. This follows immediately from [2, Corollary 1.9, p. 107], since $\mathcal{C}_B$, and therefore $\mathcal{C}'_B$, has a small maximum error probability.

There is a minor technicality in that Corollary 1.9 from [2] only explicitly shows that a large subset exists if we move from one constant probability of error $\varepsilon'$ to a smaller constant $\varepsilon''$, i.e., the corollary does not allow $\varepsilon''$ to depend on the blocklength $d_B$, while we require $\varepsilon'' = 2^{-d_B/\log\log d_B}$. However, it can easily be verified that the arguments used to prove [2, Corollary 1.9, p. 107] remain valid even if we choose $\varepsilon'' = 2^{-d_B/\log\log d_B}$. This concludes the proof. ■

Converses of Theorems: The converse for Theorem 4 is very similar to the converse for Theorem 1, with a few minor differences due to the fact that $d_B$ is no longer subexponential in $B$. We now point out these differences.

First, to prove (14) we also use (16), but now in combination with the fact that the number of codeword types grows not only polynomially with $d_B$, but actually grows polynomially with $B$, because of the positive rate constraint. This is because even if the delay $d_B$ is exponential in $B$, there can be at most $O(B)$ codeword positions with non-zero cost symbols.

Footnote: The conditional distribution is uniform over $\mathcal{T}_{\hat{Q}}[c(m)]$ because $Q$ is a discrete memoryless channel.

Second, consider the analysis of the MAP decoder after parsing the entire output sequence into blocks of size $d_B$. When $d_B$ grows exponentially with $B$, the dot-equalities used to analyze the MAP decoding error are no longer valid, because dot-equalities are defined with respect to $B$ rather than $d_B$. However, we can easily get around this problem by taking advantage of the fact that even if $d_B$ is exponential in $B$, to achieve a positive rate, each codeword must contain at most $O(B)$ non-zero cost symbols. Therefore, instead of defining $\hat{Q}$ as the joint type between the message and the realization produced by the channel, we define $\hat{Q}$ to be the conditional distribution induced by the $O(B)$ non-zero cost symbols only. With this modification, the dot-equalities are valid, and it is clear that the MAP decoder fails with probability at least 0.5 whenever the noise produces a block inducing joint type $P\hat{Q}$ (over the $O(B)$ non-zero cost symbols). The rest of the proof proceeds as in the subexponential delay case. ■

Converse of Theorem 5: For the converse of Theorem 5, one first defines a typical set of $\nu$'s that captures $1 - o(1)$ of the probability, then applies the converse argument for Theorem 1.



Proof of Theorem 2: Starting from Theorem 1,

$$C(\beta) = \max_X \min\left\{ \frac{I(X;Y)}{\mathbb{E}[k(X)]},\; \frac{I(X;Y) + D(Y \| Y_\star)}{\mathbb{E}[k(X)](1+\beta)} \right\}. \tag{19}$$

A simple upper bound is

$$C(\beta) \le \max_X \frac{1}{1+\beta}\, \frac{I(X;Y) + D(Y \| Y_\star)}{\mathbb{E}[k(X)]} \tag{20}$$

$$= \frac{1}{1+\beta} \max_X \frac{\mathbb{E}[f(X)]}{\mathbb{E}[k(X)]}, \tag{21}$$

where $f(x)$ is the divergence between the distribution of $Y$ conditional on $X = x$ and the distribution of $Y$ conditional on $X = \star$.

Using the fact that

$$\frac{a+b}{c+d} \le \max\left\{ \frac{a}{c}, \frac{b}{d} \right\},$$

we see that the above maximization is achieved for an input distribution with a point mass at $a^*$, where

$$a^* = \arg\max_x \frac{f(x)}{k(x)}.$$

However, the maximizing solution is not unique. Since $f(\star) = k(\star) = 0$,

$$\frac{p f(\star) + (1-p) f(a^*)}{p k(\star) + (1-p) k(a^*)} = \frac{f(a^*)}{k(a^*)}$$

for any $p \in [0, 1)$. Hence any input distribution with two point masses, one at $\star$ and one at $a^*$, will do. Going back to (21), we get

$$C(\beta) \le \frac{1}{1+\beta} \max_x \frac{f(x)}{k(x)}.$$

This upper bound is obtained by choosing the input distribution to maximize the second term in the minimum of (19). To prove that this upper bound can be achieved, choose $X$ to have a distribution with probability $p$ to be $\star$ and probability $1 - p$ to be $a^*$, with $p \to 1$. The first term in the min approaches

$$\max_x \frac{f(x)}{k(x)}$$

by Theorem 3 of [6]. The second term is

$$\frac{1}{1+\beta} \max_x \frac{f(x)}{k(x)},$$

as derived above (true actually for any $p$, not only $p \to 1$). So the second term is smaller, and we are always limited by the timing uncertainty. This proves the desired result. ■
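The two facts used in this proof, that $\mathbb{E}[f(X)]/\mathbb{E}[k(X)]$ never exceeds $\max_x f(x)/k(x)$ and that any mixture of $\star$ and $a^*$ attains it, are easy to check numerically. The sketch below does so for a hypothetical three-input channel whose first input stands in for the zero-cost symbol; the channel, the costs, and all function names are illustrative assumptions rather than quantities from the paper.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ratio(px, f, cost):
    """E[f(X)] / E[k(X)] for an input distribution px with positive mean cost."""
    num = sum(p * fx for p, fx in zip(px, f))
    den = sum(p * kx for p, kx in zip(px, cost))
    return num / den

def capacity_per_unit_cost(f, cost, beta):
    """(1 / (1 + beta)) * max over non-free symbols of f(x) / k(x)."""
    best = max(f[x] / cost[x] for x in range(len(cost)) if cost[x] > 0)
    return best / (1.0 + beta)

# Hypothetical channel: row 0 is the zero-cost idle symbol.
channel = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
cost = [0.0, 1.0, 0.4]
f = [kl(row, channel[0]) for row in channel]  # f(star) = 0 by construction

a_star = max((x for x in range(len(cost)) if cost[x] > 0), key=lambda x: f[x] / cost[x])
best_ratio = f[a_star] / cost[a_star]

for p in (0.0, 0.3, 0.9):
    px = [0.0] * len(cost)
    px[0] += p
    px[a_star] += 1.0 - p
    print(p, ratio(px, f, cost), best_ratio)

print(capacity_per_unit_cost(f, cost, beta=1.0))
```

The ratio is the same for every mixing weight $p < 1$, which is why the optimal input can place almost all of its mass on the free symbol without changing the second term of the minimum.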



